OpenAI's AI transcription tool hallucinates excessively – here's a better alternative



OpenAI’s Whisper, an artificial intelligence (AI) speech recognition and transcription tool launched in 2022, has been found to hallucinate or make things up — so much so that experts are worried it could cause serious damage in the wrong context.

Last week, the AP reported that a researcher at the University of Michigan “found hallucinations in eight out of every 10 audio transcriptions he inspected” produced by Whisper during a study of public meetings. 

Also: How Claude’s new AI data analysis tool compares to ChatGPT’s version (hint: it doesn’t)

The data point is one of many: separately, an engineer who reviewed 100 hours of Whisper transcriptions told the AP that he found hallucinations in roughly half of them, while another developer found hallucinations in nearly all of the 26,000 transcripts he generated with Whisper. 

While users can always expect AI transcribers to get a word or spelling wrong here and there, researchers noted that they “had never seen another AI-powered transcription tool hallucinate as much as Whisper.”

OpenAI says Whisper, an open-source neural net, “approaches human level robustness and accuracy on English speech recognition.” It is widely integrated across industries for common speech recognition tasks, including transcribing and translating interviews and generating video subtitles. 

Also: Police are using AI to write crime reports. What could go wrong?

That level of ubiquity could quickly spread fabricated text, misattributed and invented quotes, and other misinformation across several mediums, with consequences that vary in severity depending on the nature of the original material. According to the AP, Whisper is incorporated into some versions of ChatGPT, built into call centers, voice assistants, and cloud platforms from Oracle and Microsoft, and was downloaded more than 4.2 million times last month from HuggingFace. 

What’s even more concerning, experts told the AP, is that medical professionals are increasingly using “Whisper-based tools” to transcribe patient-doctor consultations. The AP interviewed more than 12 engineers, researchers, and developers who confirmed that Whisper fabricated phrases and full sentences in transcription text, some of which “can include racial commentary, violent rhetoric and even imagined medical treatments.”

Also: How AI hallucinations could help create life-saving antibiotics

“Nobody wants a misdiagnosis,” said Alondra Nelson, a professor at the Institute for Advanced Study. 

OpenAI may not have advocated for medical use cases — the company advises “against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes” — but putting the tool on the market and touting its accuracy means it’s likely to be picked up by several industries trying to expedite work and create efficiencies wherever possible, regardless of the possible risks. 

The issue doesn’t seem limited to long or poorly recorded audio, either. According to the AP, computer scientists recently found some hallucinations even in short, clear audio samples. Researchers told the AP the trend “would lead to tens of thousands of faulty transcriptions over millions of recordings.”

“The full extent of the problem is difficult to discern, but researchers and engineers said they frequently have come across Whisper’s hallucinations in their work,” the AP reports. Besides, as Christian Vogler, who directs Gallaudet University’s Technology Access Program and is deaf, pointed out, those who are deaf or hard of hearing can’t catch hallucinations “hidden amongst all this other text.” 

The researchers’ findings point to a broader problem in the AI industry: tools are rushed to market for the sake of profit, especially while the US still lacks meaningful AI regulation. This is all the more relevant given OpenAI’s ongoing debate over restructuring from a nonprofit into a for-profit company, and recent predictions from its leadership that downplay AI risks. 

Also: Could AI make data science obsolete?

“An OpenAI spokesperson said the company continually studies how to reduce hallucinations and appreciated the researchers’ findings, adding that OpenAI incorporates feedback in model updates,” AP wrote. 

While you’re waiting for OpenAI to resolve the issue, we recommend trying Otter.ai, a journalist-trusted AI transcription tool that just added six new languages. Last month, one longtime Otter.ai user noted that a new AI summary feature in the platform hallucinated a statistic, though that error wasn’t in the transcription itself. It may be wise not to rely on that feature, especially since hallucination risks can grow when AI is asked to summarize larger contexts. 

Otter.ai’s own guidance for transcription doesn’t mention hallucinations, only that “accuracy can vary based on factors such as background noise, speaker accents, and the complexity of the conversation,” and advises users to “review and edit the transcriptions to ensure complete accuracy, especially for critical tasks or important conversations.”

Also: iOS 18.1 with Apple Intelligence is here. Try these 5 AI features first

If you have an iPhone, the new iOS 18.1 with Apple Intelligence now allows AI call recording and transcription, but ZDNET’s editor-in-chief Jason Hiner says it’s “still a work in progress.” 

Meanwhile, OpenAI just announced plans to give its 250 million ChatGPT users more tools. 




